Risk Minimization and Language Modeling in Text Retrieval – Thesis Summary

Author

  • ChengXiang Zhai
Abstract

This thesis presents a new general probabilistic framework for text retrieval based on Bayesian decision theory. In this framework, queries and documents are modeled using statistical language models, user preferences are modeled through loss functions, and retrieval is cast as a risk minimization problem. The risk minimization framework not only unifies several existing retrieval models within one general probabilistic framework, but also facilitates the development of new, principled approaches to text retrieval through the use of statistical language models. We explore three special cases of the framework. In the case of a two-stage language modeling approach, we show that it is possible to achieve excellent retrieval performance without any ad hoc parameter tuning by using statistical estimation methods to set the retrieval parameters completely automatically. In the case of a KL-divergence retrieval model, we demonstrate that retrieval performance can be improved by using better language models estimated from feedback documents. Finally, in the case of non-traditional aspect retrieval models, we show that language models can be used to capture redundancy and sub-topics in documents, and to perform "context-sensitive" ranking of documents based on both their relevance and their novelty.
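
To make the KL-divergence special case concrete, here is a minimal sketch of ranking documents by the negative KL divergence between a query language model and a smoothed document language model. It is an illustration under stated assumptions, not the thesis's implementation: the abstract does not specify a smoothing method, so Dirichlet prior smoothing with a hypothetical parameter `mu` is assumed here, and all function names are invented for this example.

```python
import math
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram model: word -> relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def dirichlet_doc_model(doc_tokens, collection_model, mu=2000.0):
    """Return p(w|d) with Dirichlet prior smoothing (an assumption of this sketch)."""
    counts = Counter(doc_tokens)
    n = len(doc_tokens)

    def prob(w):
        return (counts[w] + mu * collection_model.get(w, 0.0)) / (n + mu)

    return prob


def kl_score(query_model, doc_prob):
    """Sum of p(w|q) * log p(w|d) over query words.

    This equals the negative KL divergence D(q || d) up to a term that
    depends only on the query, so it induces the same ranking.
    Words unseen in the whole collection (probability 0) are skipped.
    """
    score = 0.0
    for w, p_q in query_model.items():
        p_d = doc_prob(w)
        if p_d > 0.0:
            score += p_q * math.log(p_d)
    return score


def rank(query_tokens, docs, mu=2000.0):
    """Rank document ids by descending score; docs maps id -> token list."""
    collection = unigram_model([w for tokens in docs.values() for w in tokens])
    query_model = unigram_model(query_tokens)
    scores = {
        doc_id: kl_score(query_model, dirichlet_doc_model(tokens, collection, mu))
        for doc_id, tokens in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

In this sketch the query model is a plain maximum-likelihood estimate. The feedback-based improvement described in the abstract would correspond to replacing `query_model` with a model re-estimated from feedback documents, which is not shown here.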

Publication date: 2002